91 research outputs found
CNN-based fast source device identification
Source identification is an important topic in image forensics, since it
allows investigators to trace an image back to its origin. This information is
valuable both for claiming intellectual property and for revealing the authors
of illicit materials. In this paper we address the problem of device
identification based on sensor noise and propose a fast and accurate solution
using convolutional neural networks (CNNs). Specifically, we propose a
2-channel-based CNN that learns a way of comparing camera fingerprint and image
noise at patch level. The proposed solution is much faster than the
conventional approach while also achieving higher accuracy. This makes the
approach particularly suitable in scenarios where large databases of images are
analyzed, such as on social networks. In this vein, since images uploaded on
social media usually undergo at least two compression stages, we include
investigations on double JPEG compressed images, always reporting higher
accuracy than standard approaches.
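The conventional sensor-noise pipeline that this kind of CNN-based method is benchmarked against boils down to correlating a camera's PRNU fingerprint with the noise residual of the image under test. A minimal sketch of that correlation baseline, using synthetic stand-ins for the fingerprint and residuals (real PRNU extraction is out of scope here):

```python
import numpy as np

def ncc(fingerprint: np.ndarray, residual: np.ndarray) -> float:
    """Normalized cross-correlation between a camera fingerprint (PRNU
    estimate) and the noise residual of a query image patch."""
    f = fingerprint - fingerprint.mean()
    r = residual - residual.mean()
    return float((f * r).sum() / (np.linalg.norm(f) * np.linalg.norm(r)))

rng = np.random.default_rng(0)
fingerprint = rng.normal(size=(64, 64))                   # synthetic stand-in for a PRNU fingerprint
matching = 0.3 * fingerprint + rng.normal(size=(64, 64))  # residual from the same "camera"
other = rng.normal(size=(64, 64))                         # residual from a different "camera"

print(ncc(fingerprint, matching) > ncc(fingerprint, other))  # → True
```

Computing this correlation at every patch, for every candidate fingerprint, is what makes the conventional approach slow on large databases; the 2-channel CNN replaces the fixed correlation with a learned comparison.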
Source localization and denoising: a perspective from the TDOA space
In this manuscript, we formulate the problem of denoising Time Differences of
Arrival (TDOAs) in the TDOA space, i.e. the Euclidean space spanned by TDOA
measurements. The method consists of pre-processing the TDOAs with the purpose
of reducing the measurement noise. The complete set of TDOAs (i.e., TDOAs
computed at all microphone pairs) is known to form a redundant set, which lies
on a linear subspace in the TDOA space. Noise, however, prevents TDOAs from
lying exactly on this subspace. We therefore show that TDOA denoising can be
seen as a projection operation that suppresses the component of the noise that
is orthogonal to that linear subspace. We then generalize the projection
operator also to the cases where the set of TDOAs is incomplete. We
analytically show that this operator improves the localization accuracy, and we
further confirm this via simulations. Comment: 25 pages, 9 figures
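The projection view of TDOA denoising can be sketched numerically: with M microphones, the complete TDOA vector is a linear function of the M-1 reduced TDOAs, so projecting noisy measurements onto the column space of that linear map suppresses the orthogonal noise component. A small self-contained example (microphone count and noise level are arbitrary choices):

```python
import numpy as np

M = 5                      # number of microphones (hypothetical setup)
pairs = [(i, j) for i in range(M) for j in range(i + 1, M)]

# The complete TDOA set tau_ij is a linear function of the M-1 reduced
# TDOAs theta_j = tau_0j:  tau_ij = theta_j - theta_i  (with theta_0 = 0).
C = np.zeros((len(pairs), M - 1))
for row, (i, j) in enumerate(pairs):
    if j > 0:
        C[row, j - 1] += 1.0
    if i > 0:
        C[row, i - 1] -= 1.0

# Orthogonal projector onto the subspace spanned by the columns of C.
P = C @ np.linalg.pinv(C)

rng = np.random.default_rng(1)
theta = rng.normal(size=M - 1)        # ground-truth reduced TDOAs
tau = C @ theta                       # noise-free complete TDOA vector
noisy = tau + 0.1 * rng.normal(size=len(pairs))
denoised = P @ noisy                  # removes the noise component orthogonal to the subspace

print(np.linalg.norm(denoised - tau) < np.linalg.norm(noisy - tau))  # → True
```

Since P tau = tau, the residual error after projection is exactly P applied to the noise, which can never be longer than the noise itself.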
An In-Depth Study on Open-Set Camera Model Identification
Camera model identification refers to the problem of linking a picture to the
camera model used to shoot it. As this might be an enabling factor in different
forensic applications to single out possible suspects (e.g., detecting the
author of child abuse or terrorist propaganda material), many accurate camera
model attribution methods have been developed in the literature. One of their
main drawbacks, however, is the typical closed-set assumption of the problem.
This means that an investigated photograph is always assigned to one camera
model within a set of known ones present during investigation, i.e., training
time, and the fact that the picture can come from a completely unrelated camera
model during actual testing is usually ignored. Under realistic conditions, it
is not possible to assume that every picture under analysis belongs to one of
the available camera models. To deal with this issue, in this paper, we present
the first in-depth study on the possibility of solving the camera model
identification problem in open-set scenarios. Given a photograph, we aim at
detecting whether it comes from one of the known camera models of interest or
from an unknown one. We compare different feature extraction algorithms and
classifiers specially targeting open-set recognition. We also evaluate possible
open-set training protocols that can be applied along with any open-set
classifier, observing that the simplest of those alternatives obtains the best results.
Thorough testing on independent datasets shows that it is possible to leverage
a recently proposed convolutional neural network as feature extractor paired
with a properly trained open-set classifier aiming at solving the open-set
camera model attribution problem even on small-scale image patches, improving
over available state-of-the-art solutions. Comment: Published in the IEEE Access journal
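A common building block in open-set recognition of the kind studied here is a rejection rule on top of a closed-set decision: assign the sample to the nearest known class in feature space, but output "unknown" when even the nearest class is too far away. A minimal sketch (the features, centroids, and threshold are all hypothetical):

```python
import numpy as np

def open_set_predict(feat, centroids, threshold):
    """Assign `feat` to the nearest known-class centroid, or to the
    'unknown' class (-1) if it is too far from all of them."""
    dists = np.linalg.norm(centroids - feat, axis=1)
    k = int(np.argmin(dists))
    return k if dists[k] <= threshold else -1

# Hypothetical 2-D features for three known camera models.
centroids = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])

print(open_set_predict(np.array([0.2, -0.1]), centroids, threshold=1.5))  # → 0
print(open_set_predict(np.array([10.0, 10.0]), centroids, threshold=1.5))  # → -1
```

The threshold trades off false rejections of known models against false acceptances of unknown ones, which is exactly the tension the open-set training protocols in the paper are designed to manage.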
All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection
Recent advances in deep learning and computer vision have made the synthesis
and counterfeiting of multimedia content more accessible than ever, leading to
possible threats and dangers from malicious users. In the audio field, we are
witnessing the growth of speech deepfake generation techniques, which call for
the development of synthetic speech detection algorithms to counter possible
mischievous uses such as frauds or identity thefts. In this paper, we consider
three different feature sets proposed in the literature for the synthetic
speech detection task and present a model that fuses them, achieving overall
better performance than state-of-the-art solutions. The system
was tested on different scenarios and datasets to prove its robustness to
anti-forensic attacks and its generalization capabilities. Comment: Accepted at the ECML-PKDD 2023 Workshop "Deep Learning and Multimedia
Forensics. Combating fake media and misinformation"
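A simple form of the feature fusion described above is early fusion: normalize each feature set so that no single one dominates by scale, then concatenate them into one vector for the downstream classifier. A minimal sketch (the feature names and sizes are made up for illustration):

```python
import numpy as np

def fuse(features):
    """Early fusion by concatenation: L2-normalize each per-feature-set
    vector, then stack them into a single fused vector."""
    normed = [f / (np.linalg.norm(f) + 1e-12) for f in features]
    return np.concatenate(normed)

# Hypothetical feature sets extracted from one audio clip.
spectral = np.ones(4)
prosodic = np.ones(3)
learned = np.ones(5)

fused = fuse([spectral, prosodic, learned])
print(fused.shape)  # → (12,)
```

The fused vector can then feed any classifier; more elaborate fusion schemes learn the combination weights jointly with the detector.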
Anti-Aliasing Add-On for Deep Prior Seismic Data Interpolation
Data interpolation is a fundamental step in any seismic processing workflow.
Among the machine learning techniques recently proposed to solve data interpolation
as an inverse problem, the Deep Prior paradigm employs a convolutional
neural network to capture priors on the data in order to regularize the
inversion. However, this technique lacks reconstruction precision when
interpolating highly decimated data, due to the presence of aliasing. In this
work, we propose to improve Deep Prior inversion by adding a directional
Laplacian as regularization term to the problem. This regularizer drives the
optimization towards solutions that honor the slopes estimated from the
interpolated data low frequencies. We provide some numerical examples to
showcase the methodology devised in this manuscript, showing that our results
are less prone to aliasing, even in the presence of noisy and corrupted data.
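The key property exploited by a directional Laplacian regularizer is that a second derivative taken along the local slope direction annihilates events with that slope, so penalizing its energy steers the inversion toward slope-consistent solutions. A small numpy illustration on a synthetic linear event (the slope, grid, and finite-difference scheme are illustrative choices, not the paper's exact operator):

```python
import numpy as np

def directional_laplacian(u, p):
    """Second derivative along the slope direction, (d/dx + p d/dt)^2,
    applied to a 2-D section u[t, x]. Events with slope p are (nearly)
    annihilated, so ||L u||^2 favors slope-consistent reconstructions."""
    du_t, du_x = np.gradient(u)       # gradients along axis 0 (t) and axis 1 (x)
    d1 = du_x + p * du_t              # first directional derivative
    d1_t, d1_x = np.gradient(d1)
    return d1_x + p * d1_t            # second directional derivative

t = np.arange(64)[:, None]
x = np.arange(64)[None, :]
p = 0.5
event = np.sin(0.2 * (t - p * x))     # synthetic linear event with slope p

aligned = directional_laplacian(event, p)    # operator matched to the true slope
wrong = directional_laplacian(event, -p)     # operator with the wrong slope
inner = (slice(2, -2), slice(2, -2))         # ignore boundary stencils
print(np.abs(aligned[inner]).max() < 0.01 * np.abs(wrong[inner]).max())  # → True
```

When the assumed slope matches the data, the regularizer's output is orders of magnitude smaller, which is why slopes estimated from the alias-free low frequencies can guide the high-frequency reconstruction.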
Training CNNs in Presence of JPEG Compression: Multimedia Forensics vs Computer Vision
Convolutional Neural Networks (CNNs) have proved very accurate in multiple
computer vision image classification tasks that required visual inspection in
the past (e.g., object recognition, face detection, etc.). Motivated by these
astonishing results, researchers have also started using CNNs to cope with
image forensic problems (e.g., camera model identification, tampering
detection, etc.). However, in computer vision, image classification methods
typically rely on visual cues easily detectable by human eyes. Conversely,
forensic solutions rely on almost invisible traces that are often very subtle
and lie in the fine details of the image under analysis. For this reason,
training a CNN to solve a forensic task requires some special care, as common
processing operations (e.g., resampling, compression, etc.) can strongly hinder
forensic traces. In this work, we focus on the effect that JPEG has on CNN
training considering different computer vision and forensic image
classification problems. Specifically, we consider the issues that arise from
JPEG compression and misalignment of the JPEG grid. We show that it is
necessary to account for these effects when generating a training dataset in order
to properly train a forensic detector without losing generalization capability,
whereas these effects can largely be ignored for computer vision
tasks.
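One practical consequence of the grid-misalignment issue is in how training patches are cropped: if every patch starts at a multiple of 8, the network only ever sees a single JPEG grid alignment. A simple augmentation sketch (patch size and image are illustrative; randomizing the crop origin over 0-7 pixels desynchronizes the 8x8 grid):

```python
import numpy as np

def random_grid_crop(img, out, rng):
    """Crop `img` to an `out`-sized patch at a random offset, so the crop's
    origin falls at an arbitrary position (0-7) inside the 8x8 JPEG grid.
    Augmenting a training set this way keeps a forensic CNN from latching
    onto one fixed grid alignment."""
    dy = int(rng.integers(0, 8))
    dx = int(rng.integers(0, 8))
    return img[dy:dy + out, dx:dx + out]

rng = np.random.default_rng(2)
img = np.zeros((136, 136), dtype=np.uint8)   # stand-in for a JPEG-decoded image
patch = random_grid_crop(img, 128, rng)
print(patch.shape)  # → (128, 128)
```

For computer vision tasks the crop origin barely matters, but for forensic traces that live on the 8x8 block structure this choice changes what the network can learn.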
Aligned and Non-Aligned Double JPEG Detection Using Convolutional Neural Networks
Due to the wide diffusion of JPEG coding standard, the image forensic
community has devoted significant attention to the development of double JPEG
(DJPEG) compression detectors through the years. The ability to detect
whether an image has been compressed twice provides paramount information
for image authenticity assessment. Given the success recently achieved by
convolutional neural networks (CNNs) in many computer vision tasks, in this
paper we propose to use CNNs for aligned and non-aligned double JPEG
compression detection. In particular, we explore the capability of CNNs to
capture DJPEG artifacts directly from images. Results show that the proposed
CNN-based detectors achieve good performance even with small size images (i.e.,
64x64), outperforming state-of-the-art solutions, especially in the non-aligned
case. Besides, good results are also achieved in the commonly-recognized
challenging case in which the first quality factor is larger than the second
one. Comment: Submitted to the Journal of Visual Communication and Image Representation
(first submission: March 20, 2017; second submission: August 2, 2017).
On the use of Benford's law to detect GAN-generated images
The advent of Generative Adversarial Network (GAN) architectures has given
anyone the ability of generating incredibly realistic synthetic imagery. The
malicious diffusion of GAN-generated images may lead to serious social and
political consequences (e.g., fake news spreading, opinion formation, etc.). It
is therefore important to regulate the widespread distribution of synthetic
imagery by developing solutions able to detect them. In this paper, we study
the possibility of using Benford's law to discriminate GAN-generated images
from natural photographs. Benford's law describes the distribution of the most
significant digit for quantized Discrete Cosine Transform (DCT) coefficients.
Extending and generalizing this property, we show that it is possible to
extract a compact feature vector from an image. This feature vector can be fed
to an extremely simple classifier for the purpose of GAN-generated image detection.
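Benford's law predicts P(d) = log10(1 + 1/d) for the first significant digit d, and a first-digit histogram of quantized DCT coefficients can be compared against this prediction. A minimal sketch of such a feature (the coefficients below are synthetic stand-ins, not a real DCT stream):

```python
import numpy as np

def benford_probs():
    """Benford's law: P(first digit = d) = log10(1 + 1/d), for d = 1..9."""
    d = np.arange(1, 10)
    return np.log10(1.0 + 1.0 / d)

def first_digit_feature(coeffs):
    """Normalized 9-bin histogram of the most significant digit of the
    non-zero quantized coefficients; its divergence from the Benford
    prediction can serve as a compact detection feature."""
    c = np.abs(coeffs[coeffs != 0]).astype(int)
    msd = np.array([int(str(v)[0]) for v in c])   # first significant digit
    hist = np.bincount(msd, minlength=10)[1:10].astype(float)
    return hist / hist.sum()

# Hypothetical quantized coefficients (Laplacian-like, as DCT coefficients tend to be).
rng = np.random.default_rng(3)
coeffs = np.round(rng.laplace(scale=20.0, size=5000)).astype(int)

feat = first_digit_feature(coeffs)
print(feat.shape)                                 # → (9,)
print(abs(benford_probs().sum() - 1.0) < 1e-12)   # the nine probabilities sum to 1
```

Concatenating such histograms (or their divergences from the Benford prediction) across DCT frequencies and quantization steps yields the compact feature vector the abstract refers to.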
H4VDM: H.264 Video Device Matching
Methods that can determine if two given video sequences are captured by the
same device (e.g., mobile telephone or digital camera) can be used in many
forensics tasks. In this paper we refer to this as "video device matching". In
open-set video forensics scenarios, it is easier to determine whether two video
sequences were captured with the same device than to identify the specific
device. In this paper, we propose a technique for open-set video device
matching. Given two H.264 compressed video sequences, our method can determine
if they are captured by the same device, even if our method has never
encountered the device in training. We denote our proposed technique as H.264
Video Device Matching (H4VDM). H4VDM uses H.264 compression information
extracted from video sequences to make decisions. It is more robust against
artifacts that alter camera sensor fingerprints, and it can be used to analyze
relatively small fragments of the H.264 sequence. We trained and tested our
method on a publicly available video forensics dataset consisting of 35
devices, where our proposed method demonstrated good performance.